Abstract



Tries are popular data structures for storing a set of strings, where common prefixes are 
represented by common root-to-node paths. Over fifty years of usage have produced many 
variants and implementations to overcome some of their limitations. We explore new succinct 
representations of path-decomposed tries and experimentally evaluate the corresponding 
reduction in space usage and memory latency, comparing with the state of the art. We 
study two applications: (1) a compressed dictionary for (compressed) strings, and (2) a monotone minimal perfect hash for strings that preserves their lexicographic order.

For (1), we obtain data structures that outperform other state-of-the-art compressed 
dictionaries in space efficiency, while obtaining predictable query times that are competitive 
with data structures preferred by the practitioners. In (2), our tries perform several times 
faster than other trie-based monotone perfect hash functions, while occupying nearly the 
same space. 



Part of the work done while the author was an intern at Microsoft Research, Cambridge. 



1 Introduction 



Tries are a widely used data structure that turns a string set into a digital search tree. Several operations can be supported, such as mapping the strings to integers, retrieving a string from the trie, performing prefix searches, and many others. Thanks to their simplicity and functionality, they have enjoyed remarkable popularity in a number of fields (Computational Biology, Data Compression, Data Mining, Information Retrieval, Natural Language Processing, Network Routing, Pattern Matching, Text Processing, and Web applications, to name a few), motivating the significant effort spent on the variety of their implementations over the last fifty years.

However, their simplicity comes at a cost: like most tree structures, they generally suffer from poor locality of reference due to pointer chasing. This effect is amplified when using space-efficient representations of tries, where performing any basic navigational operation, such as visiting a child, possibly requires accessing several directories, usually with unpredictable memory access patterns. Tries are particularly affected because they are unbalanced structures: the height can be in the order of the number of strings in the set. Furthermore, space savings are achieved only by exploiting the common prefixes in the string set, while it is not clear how to compress their nodes and their labels without incurring an unreasonable overhead in the running time.

In this paper, we study experimentally how path decompositions of tries help with both of the above issues, inspired by the work presented in [18]. By using a centroid path decomposition, the height is guaranteed to be logarithmic, dramatically reducing the number of cache misses in a traversal; besides, for any path decomposition the labels can be laid out in a way that enables efficient compression and decompression of a label in a sequential fashion.

We keep two main goals in mind: (i) reduce the space requirement, and (ii) guarantee fast query times using algorithms that exploit the memory hierarchy. In our algorithm engineering design, we follow some guidelines: (a) the proposed algorithms and data structures should be as simple as possible to ensure reproducibility of the results, while the performance should be similar to or better than what is available in the state of the art. (b) The proposed techniques should possibly rest on a theoretical ground. (c) The theoretical complexity of some operations is allowed to be worse than that known for the best solutions when there is a clear experimental benefit¹, since we seek the best performance in practice.

The literature about space-efficient and cache-efficient tries is vast. Several papers address 
the issue of a cache-friendly access to a set of strings supporting prefix search, e.g. [H [6l [11] [T7] 
but they do not deal with space issues except [6], which introduces an elegant variant of front 
coding. Other papers aiming at succinct labeled trees and compressed data structures for strings, 
e.g. [21 [5j El El [191 ESI [28], support powerful operations — such as path queries — and are very 
good at compressing data, but they do not exploit the memory hierarchy. Few papers [12, 18] combine (nearly) optimal information-theoretic bounds for space occupancy with good cache-efficiency bounds, but no experimental analysis is performed. More references on compressed string dictionaries can be found in [10].

The paper is organized as follows. We apply our path decomposition ideas to string dictionaries in Section 2 and to monotone perfect hash functions (hollow tries) in Section 3, showing that it is possible to improve their performance with a very small space overhead. In Section 4 we present some optimizations to the Range Min-Max tree [28, 3], which we use to support fast operations on balanced parentheses, improving both in space and time on the existing implementations [3]. Our experimental results are discussed in Section 5, where our implementations compare very favorably to some of the best implementations. We provide the source code at http://github.com/ot/path_decomposed_tries for the reader interested in further comparisons.

1 For example, it is folklore that a sequential scan of a small sorted set of keys is faster than a binary search, because the former method is very friendly to the branch prediction and cache prefetching of modern machines.

1.1 Background and tools 

In the following we make extensive use of compacted tries and basic succinct data structures. 
Compacted tries. To fix the notation, we briefly recall the definition of compacted tries. We build the trie recursively in the following way. Basis: the compacted trie of a single string is a node whose label is the string. Inductive step: given a nonempty string set S, the root of the tree is labeled with the longest common prefix α (possibly empty) of the strings in S. For each character b such that the set S_b = {β | αbβ ∈ S} is nonempty, the compacted trie built on S_b is attached to the root as a child. The edge is labeled with the branching character b. The length of the label α is also called the skip, and denoted by δ. Unless otherwise specified, we will use trie to indicate a compacted trie in the rest of the paper.
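The recursive definition translates almost literally into code. The following is a minimal, pointer-based Python sketch (the names `TrieNode` and `build_compacted_trie` are ours). It assumes the string set is prefix-free, for example because every string ends with a terminator such as `$`, so that no string is a proper prefix of another. The skip δ of a node is simply `len(node.label)`.

```python
from dataclasses import dataclass, field


@dataclass
class TrieNode:
    label: str                                    # the longest common prefix alpha
    children: dict = field(default_factory=dict)  # branching char b -> trie on S_b


def build_compacted_trie(strings):
    """Compacted trie of a nonempty, prefix-free string set (assumption of this sketch)."""
    strings = sorted(set(strings))
    if len(strings) == 1:                  # basis: a single string gives a leaf
        return TrieNode(strings[0])
    # in a sorted set, the LCP of all strings is the LCP of the first and the last one
    first, last = strings[0], strings[-1]
    k = 0
    while k < min(len(first), len(last)) and first[k] == last[k]:
        k += 1
    groups = {}
    for s in strings:                      # S_b = { beta : alpha b beta in S }
        groups.setdefault(s[k], []).append(s[k + 1:])
    node = TrieNode(first[:k])
    for b, suffixes in groups.items():     # the edge to each child is labeled by b
        node.children[b] = build_compacted_trie(suffixes)
    return node
```

For instance, on `["three$", "trial$", "triangle$", "triangular$"]` the root is labeled `t` (skip 1) and has two children, reached through the branching characters `h` and `r`.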

Rank and Select operations. Given a bitvector X, we can define the following operations: Rank_b(i) returns the number of occurrences of bit b ∈ {0, 1} in the first i positions in X; Select_b(i) returns the position of the i-th occurrence of bit b in X. These operations can be supported in constant time by adding a negligible redundancy to the bitvector [13, 21].
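The redundancy mentioned above usually comes from sampling cumulative counts. The following simplified Python sketch stores only the absolute rank every `block` bits, then finishes each query with a short scan. It only illustrates the sampling idea: it is not a constant-time directory such as those of [13, 21] or the rank9 structure used later in the paper, and the class name and parameter are ours.

```python
class RankSelect:
    """Sampled Rank_1/Select_1 on a list of bits (sketch; real structures use
    multi-level counters and broadword popcounts to answer in constant time)."""

    def __init__(self, bits, block=64):
        self.bits, self.block = bits, block
        self.samples = [0]               # samples[q] = number of 1s in bits[0:q*block]
        ones = 0
        for i, b in enumerate(bits):
            ones += b
            if (i + 1) % block == 0:
                self.samples.append(ones)

    def rank1(self, i):
        """Number of 1s in the first i positions."""
        q = i // self.block
        return self.samples[q] + sum(self.bits[q * self.block:i])

    def rank0(self, i):
        return i - self.rank1(i)

    def select1(self, k):
        """Position of the k-th 1 (k >= 1): binary search on the samples, then scan."""
        lo, hi = 0, len(self.samples) - 1
        while lo < hi:                   # last sample with fewer than k ones before it
            mid = (lo + hi + 1) // 2
            if self.samples[mid] < k:
                lo = mid
            else:
                hi = mid - 1
        count = self.samples[lo]
        for i in range(lo * self.block, len(self.bits)):
            count += self.bits[i]
            if count == k:
                return i
        raise ValueError("k exceeds the number of 1s")
```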
Elias-Fano encoding. The Elias-Fano representation [15, 16] is an encoding scheme to represent a non-decreasing sequence of m integers in [0, n) occupying 2m + m⌈log(n/m)⌉ + o(m) bits, while supporting constant-time access to the i-th integer. The scheme is very simple and elegant, and efficient implementations are described in [20, 27, 33].
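The encoding itself is easy to sketch. Each value is split into l = ⌈log(n/m)⌉ low bits, stored verbatim, and a high part, stored by setting bit (high part + i) in an upper bitvector of about 2m bits. The Python sketch below makes several simplifying assumptions: lists stand in for bitvectors, and `access` locates the i-th 1 by a linear scan where a real implementation would use a constant-time Select_1. It illustrates the layout, not the query time.

```python
class EliasFano:
    """Elias-Fano layout of a non-decreasing sequence of m integers in [0, n) (sketch)."""

    def __init__(self, values, n):
        m = len(values)
        self.l = 0
        while m and (m << self.l) < n:   # smallest l with m * 2^l >= n, i.e. ceil(log2(n/m))
            self.l += 1
        self.low = [v & ((1 << self.l) - 1) for v in values]
        # upper bits: element i sets bit (v >> l) + i, so high parts are gap-coded in unary
        self.high = [0] * ((((n - 1) >> self.l) + m) if m else 0)
        for i, v in enumerate(values):
            self.high[(v >> self.l) + i] = 1

    def access(self, i):
        """i-th element (0-based): the position of the i-th 1, minus i, is its high part."""
        seen = -1
        for pos, bit in enumerate(self.high):
            seen += bit
            if bit and seen == i:
                return ((pos - i) << self.l) | self.low[i]
        raise IndexError(i)
```

With m = 8 values in [0, 32), each element keeps l = 2 low bits and the upper bitvector has 15 bits, in line with the 2m + m⌈log(n/m)⌉ bound.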

Balanced parentheses (BP). In a sequence of balanced parentheses each open parenthesis ( can be associated with its mate ). The operations FindClose and FindOpen find the mate of an open and of a closed parenthesis, respectively. The sequences can be represented as bitvectors, where 1 represents ( and 0 represents ), and by adding a negligible redundancy it is possible to support the above operations in constant or nearly-constant time [21, 25].
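As a reference point for the succinct structures discussed later, the two operations can be defined by a linear scan on the running excess (number of opens minus number of closes). The sketch below uses the bit convention above (1 = `(`, 0 = `)`). It shows what is computed, not how the constant-time structures compute it, and the function names are ours.

```python
def find_close(bp, i):
    """Position of the ')' matching the '(' at position i (linear scan)."""
    assert bp[i] == 1
    excess = 0
    for j in range(i, len(bp)):
        excess += 1 if bp[j] else -1
        if excess == 0:          # back to the excess level just before position i
            return j
    raise ValueError("unbalanced sequence")


def find_open(bp, j):
    """Position of the '(' matching the ')' at position j (symmetric backward scan)."""
    assert bp[j] == 0
    excess = 0
    for i in range(j, -1, -1):
        excess += 1 if bp[i] else -1
        if excess == 0:
            return i
    raise ValueError("unbalanced sequence")
```

For example, on `(()())`, i.e. bits `[1, 1, 0, 1, 0, 0]`, `find_close(bp, 0)` returns 5 and `find_open(bp, 4)` returns 3.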

2 String Dictionaries 

In this section we describe an implementation of string dictionaries using path-decomposed tries. 
A string dictionary is a data structure on a string set S ⊂ Σ* that supports the following operations:

• Lookup(s) returns −1 if s ∉ S, or a unique identifier in [0, |S|) otherwise.

• Access(i) retrieves the string with identifier i; note that Access(Lookup(s)) = s if s ∈ S.

Path decomposition. Our string dictionaries, inspired by the approach described in [18], are based on path decompositions of the trie built on S (recall that we use trie to indicate compacted trie in the rest of the paper). A path decomposition T^c of a trie T is a tree where each node in T^c represents a path in T. It is defined recursively in the following way: a root-to-leaf path in T is chosen and represented by the root node in T^c. The same procedure is applied recursively to the sub-tries hanging off the chosen path, and the obtained trees become the children of the root. Note that in the above procedure the order of the decomposed sub-tries as children of the root is arbitrary. Unlike [18], which arranges the sub-tries in lexicographic order, we arrange them in bottom-to-top, left-to-right order, since this simplifies the traversal. Figure 1 shows a path in T and its resulting node in T^c.

There is a one-to-one correspondence between paths: root-to-node paths in T^c correspond to root-to-leaf paths in the trie T, hence to strings in S. This also implies that T^c has exactly |S| nodes, and that the height of T^c cannot be larger than that of T. Different strategies for choosing the paths in the decomposition give rise to different properties. We describe two such strategies.

• Leftmost path: Always choose the leftmost child. 
Figure 1: Path decomposition of a trie. The α_i denote the labels of the trie nodes, c_i and b_i the branching characters (depending on whether they are on the path or not).



• Heavy path: Always choose the heavy child, i.e. the one whose sub-trie has the most leaves (arbitrarily breaking ties). This is the strategy adopted in [18] and borrowed from [29].

Observation 2.1 If the leftmost path is used in the path decomposition, the depth-first order of the nodes in T^c is equal to the depth-first order of their corresponding leaves in T. Hence if T is lexicographically ordered, so is T^c. We call it a lexicographic path decomposition.

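Observation 2.2 If the heavy path is used in the path decomposition, the height of the resulting tree is bounded by O(log |S|), since every sub-trie hanging off a heavy path has at most half the leaves of the sub-trie where the path starts. We call such a decomposition a centroid path decomposition.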

The two strategies enable a time/functionality trade-off: a lexicographic path decomposition guarantees that the indices returned by Lookup are lexicographic, at the cost of a potentially linear height of the tree (but never higher than the trie). On the other hand, if the order of the indices is irrelevant, the centroid path decomposition gives logarithmic guarantees².
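The trade-off can be checked on small inputs. The Python sketch below computes the height of the decomposition tree T^c under both strategies. To keep it short, it uses a plain character-per-node trie instead of a compacted one. Unary nodes have no hanging sub-tries, so they do not change the decomposition. All names are ours, and the strings are assumed to be terminated by `$` so the set is prefix-free.

```python
def build_trie(strings):
    """Plain trie as nested dicts; a leaf is an empty dict."""
    root = {}
    for s in strings:
        node = root
        for ch in s:
            node = node.setdefault(ch, {})
    return root


def count_leaves(node, memo):
    if id(node) not in memo:
        memo[id(node)] = 1 if not node else sum(count_leaves(c, memo) for c in node.values())
    return memo[id(node)]


def decomposition_height(node, strategy, memo):
    """Height, in edges, of the path decomposition tree of the sub-trie at `node`.
    strategy: "leftmost" (lexicographic) or "heavy" (centroid, ties -> leftmost)."""
    height = 0
    while node:                                   # walk the chosen node-to-leaf path
        keys = sorted(node)
        if strategy == "leftmost":
            nxt = keys[0]
        else:
            nxt = max(keys, key=lambda k: (count_leaves(node[k], memo), -ord(k)))
        for k in keys:
            if k != nxt:                          # every hanging sub-trie is a child in T^c
                height = max(height, 1 + decomposition_height(node[k], strategy, memo))
        node = node[nxt]
    return height
```

On the "caterpillar" set {a^i$ : 0 ≤ i < 50}, the leftmost path always stops at `$`, so the lexicographic decomposition has height 49. The heavy path instead runs down the spine and has height 1.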

We exploit a crucial property of path decompositions: since each node in T c corresponds to 
a node-to-leaf path in T, the concatenation of the labels in the node-to-leaf path corresponds to 
a suffix of a string in S. To simulate a traversal of T using T c we only need to scan sequentially 
character-by-character the label of each node until we find the needed child node. Hence, any 
representation of the labels that supports sequential access (simpler than random access) is 
sufficient. Besides being cache-friendly, as we will see in the next section, this allows an efficient
compression of the labels. 

Trie representation. We represent the path-decomposed trie with three sequences (see Figure 1, which contains an example for the root node):

• The bitvector BP encodes the trie topology using DFUDS [7]: each node is represented as a run of (s whose length is the degree of the node, followed by a single ); the node representations are then concatenated in depth-first order.

• The array B contains the branching characters of each node: they are written in reverse 
order per node, and then concatenated in depth-first order. Note that the branching 
characters are in one-to-one correspondence with the (s of BP. 

2 In [18] the authors show how to obtain lexicographic indices in a centroid path-decomposed trie, using secondary
support structures and arranging the nodes in a different order. The navigational operations are noticeably more 
complex, and require more powerful primitives on the underlying succinct tree, in particular for Access. 
• The sequence L contains the labels of each node. We recall that each label represents a path in the trie. We encode the path by augmenting the alphabet Σ with |Σ| − 1 special characters, Σ' = Σ ∪ {1, 2, ..., |Σ| − 1}, alternating the label and the branching character of each node in the path with the number of sub-tries hanging off that node, encoded with the new special characters. We concatenate the representations of the labels in depth-first order in the sequence L, so that each label is in correspondence with a ) in BP. Note that the labels are represented in a larger alphabet; we will show later how to encode them. Also, since the label representations are variable-sized, we encode their endpoints using an Elias-Fano sequence.

Trie operations. To implement Lookup we start from the root and begin scanning its label. If the character is a special character, we add it to an accumulator; otherwise we check for a mismatch with the string at the current position. When there is a mismatch, the accumulator indicates the range of children of the root (and thus of branching characters) that branch from that point in the path in the original trie. Hence we can find the right branching character (or conclude that there is none, i.e. the string was not in the set) and then the child to jump to. We then proceed recursively until the string is fully traversed or it cannot be extended further: the index returned is the value of Rank_) for the final node in the former case (i.e. the depth-first index of that node), or −1 in the latter case. Note that it is possible to avoid all the Rank calls needed to access L and B by using the standard trick of double-counting, i.e. exploiting the observation that between two mates there is an equal number of (s and )s.

Access is performed similarly but in a bottom-up fashion. The node position is obtained from the index through a Select_), then the path is reconstructed by jumping to the parent until the root is reached. Since we know for each node which child we came from, we can scan its label until the sum of special characters encountered exceeds the child index. The normal characters seen during the scan are appended to the string to be returned.
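The whole scheme fits in a short Python sketch, with several stated simplifications. Nodes of T^c are explicit objects instead of the BP/B/L sequences, and each label is a list mixing normal characters (`str`) and special characters (`int`, the number of hanging sub-tries). Branching characters are kept in reverse order, as in B, so that the accumulator directly gives a contiguous range of B. The terminator `"\0"` (assumed absent from the input strings), all names, and the parent pointers are our assumptions. Access first climbs to the root and then scans the labels top-down, so that characters come out in order.

```python
import os

TERM = "\0"  # assumed end marker keeping the set prefix-free (not from the paper)

def trie(strings):  # sorted, distinct, prefix-free -> (label, kids, #leaves)
    if len(strings) == 1:
        return (strings[0], {}, 1)
    k = len(os.path.commonprefix([strings[0], strings[-1]]))
    groups = {}
    for s in strings:
        groups.setdefault(s[k], []).append(s[k + 1:])
    kids = {b: trie(g) for b, g in groups.items()}
    return (strings[0][:k], kids, sum(c[2] for c in kids.values()))

class PDNode:
    def __init__(self):
        self.label = []     # normal chars (str) and special chars (int = #hanging sub-tries)
        self.branch = []    # branching chars of the children, reversed (the B sequence)
        self.children = []  # bottom-to-top, left-to-right
        self.parent = self.id = None

def decompose(t):  # centroid path decomposition: follow the heavy child (first on ties)
    node, levels = PDNode(), []
    while True:
        label, kids, _ = t
        node.label.extend(label)
        if not kids:
            break
        heavy = max(kids, key=lambda b: kids[b][2])
        hanging = [(b, c) for b, c in kids.items() if b != heavy]
        node.label += [len(hanging), heavy]  # special char, then the path's branching char
        levels.append(hanging)
        t = kids[heavy]
    for hanging in reversed(levels):
        for b, c in hanging:
            child = decompose(c)
            child.parent = node
            node.children.append(child)
            node.branch.append(b)
    node.branch.reverse()
    return node

def build(strings):
    root = decompose(trie(sorted({s + TERM for s in strings})))
    order, stack = [], [root]
    while stack:  # identifiers = depth-first (pre)order in T^c
        n = stack.pop()
        n.id = len(order)
        order.append(n)
        stack.extend(reversed(n.children))
    return root, order

def lookup(root, s):
    s, node, pos = s + TERM, root, 0
    while True:
        acc, i, lab = 0, 0, node.label
        while i < len(lab):
            x, ch = lab[i], s[pos:pos + 1]  # ch is "" once the pattern is exhausted
            if isinstance(x, int):  # x sub-tries branch off at this point of the path
                if ch == lab[i + 1]:  # keep following the path
                    acc, pos, i = acc + x, pos + 1, i + 2
                    continue
                for j in range(acc, acc + x):  # linear search in this node's branch range
                    if node.branch[j] == ch:
                        node, pos = node.children[len(node.children) - 1 - j], pos + 1
                        break
                else:
                    return -1
                break  # restart the scan on the child's label
            if ch != x:  # mismatch inside a trie label
                return -1
            pos, i = pos + 1, i + 1
        else:
            return node.id if pos == len(s) else -1

def access(order, ident):
    path = [order[ident]]
    while path[-1].parent is not None:  # climb to the root...
        path.append(path[-1].parent)
    path.reverse()  # ...then scan the labels top-down
    out = []
    for node, nxt in zip(path, path[1:] + [None]):
        j = None if nxt is None else len(node.children) - 1 - node.children.index(nxt)
        acc, i, lab = 0, 0, node.label
        while i < len(lab):
            x = lab[i]
            if not isinstance(x, int):
                out.append(x)
                i += 1
            elif j is not None and acc + x > j:  # j = child's index in (reversed) B
                out.append(node.branch[j])  # leave the path through branch j
                break
            else:
                acc += x
                out.append(lab[i + 1])
                i += 2
    return "".join(out)[:-1]  # drop TERM
```

Usage: `root, order = build(words)` assigns each word a distinct identifier in [0, |S|). Then `access(order, lookup(root, w)) == w` holds for every word, and strings outside the set map to −1.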

Time complexity. For Lookup, for each node in the traversal we perform a sequential scan of the labels and a binary search on the branching characters. If the pattern has length p, we can never see more than p special characters during the scan. Hence, if we assume constant-time FindClose and Elias-Fano retrieval, the total number of operations is O(p + h log |S|), while the number of random memory accesses is bounded by O(h), where h is the height of the path decomposition tree. Access is symmetric, except that the binary search is not needed and p ≥ h, so the number of operations is bounded by O(p), where p is the length of the returned string. Again, the number of random memory accesses is bounded by O(h).
Labels encoding and compression. As previously mentioned, we only need to scan the label of each node sequentially, so we can use any encoding that supports sequential scan with a constant amount of work per character. In the uncompressed trie, as a baseline, we simply use a vbyte encoding [34]. Since most bytes in the datasets do not exceed 127 as a value, there is
no noticeable space overhead. For a less sparse alphabet, more sophisticated encodings can be 
used. 
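For concreteness, here is a vbyte sketch in Python. It follows one common convention: 7 payload bits per byte, least significant group first, with the high bit set on every byte except the last of a value. The exact layout in [34] and in the authors' code may differ. The decoder does constant work per byte, which is all that the sequential label scan requires. Values below 128, such as most bytes of the datasets, take a single byte.

```python
def vbyte_encode(values):
    """Encode non-negative integers, 7 bits per byte; high bit = more bytes follow."""
    out = bytearray()
    for v in values:
        while v >= 0x80:
            out.append((v & 0x7F) | 0x80)
            v >>= 7
        out.append(v)
    return bytes(out)


def vbyte_decode(data):
    """Sequential decoder yielding the integers in order."""
    value, shift = 0, 0
    for byte in data:
        value |= (byte & 0x7F) << shift
        if byte & 0x80:
            shift += 7
        else:
            yield value
            value, shift = 0, 0
```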

The freedom in choosing the encoding allows us to explore other trade-offs. We take advan- 
tage of this to compress the labels, with an almost negligible overhead in the operations. 

We adopt a simple dictionary compression scheme for the labels: we choose a static dictionary 
of variable-sized words (that can be drawn from any alphabet) that will be stored along the 
tree explicitly, such that the overall size of the dictionary is bounded by a given parameter 
(constant) D. The node labels are then parsed into words of the dictionary, and the words are 
sorted according to their frequency in the parsing: a code is assigned to each word in decreasing 
order of frequency, so that more frequent words have smaller codes. The codes are then encoded 
using some variable-length integer encoding; we use vbyte to favor performance. To decompress 
the label, we scan the codes and for each code we scan the word in the dictionary, hence each 
character requires a constant amount of work.
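The code assignment and the decoder can be sketched independently of how the parsing was obtained. In the Python sketch below, labels are assumed to be already parsed into dictionary words. Codes are assigned in decreasing order of frequency, with ties broken by the word itself, which is an arbitrary choice of ours. The final vbyte step is omitted, so code sequences are plain integer lists. Decoding emits each referenced word character by character.

```python
from collections import Counter


def assign_codes(parsed_labels):
    """More frequent words get smaller codes; returns (words by code, word -> code)."""
    freq = Counter(w for label in parsed_labels for w in label)
    words = sorted(freq, key=lambda w: (-freq[w], w))
    return words, {w: c for c, w in enumerate(words)}


def decode_label(codes, words):
    """Sequential decompression: constant work per output character."""
    for c in codes:
        yield from words[c]
```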

We remark that the decompression algorithm is completely agnostic of how the dictionary 
was chosen and how the strings are parsed. For example, domain knowledge about the data 
could be exploited; in texts, the most frequent words would probably be a good choice. 

Since we are looking for a general-purpose scheme, we used a modified version of the approximate Re-Pair [24] described in [14]: we initialize the dictionary to the alphabet Σ and
scan the string to find the k most frequent pairs of codes. Then we select all the pairs whose 
corresponding substrings fit in the dictionary and substitute them in the sequence. We then 
iterate until the dictionary is filled (or there are no more repeated pairs). From this we obtain 
simultaneously the dictionary and the parsing. To allow the labels to be accessed independently, 
we take care that no pairs are formed on label boundaries, as done in [10].
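A simplified Python sketch of this construction is shown below. It is not the exact procedure of [14]. Each round counts adjacent code pairs inside each label, never across label boundaries, and takes the k most frequent repeated pairs. Each pair is merged if its string fits in the remaining budget D, measured in stored characters, and if it still occurs after the merges already made in that round. The default k, the tie-breaking, and the skipping of vanished pairs are our assumptions.

```python
from collections import Counter


def approx_repair(labels, D, k=4):
    """Return (words, parsed): words[c] is the string of code c, and each parsed
    label is a list of codes whose words concatenate back to the label."""
    words = sorted({ch for lab in labels for ch in lab})   # start from the alphabet
    code = {w: c for c, w in enumerate(words)}
    parsed = [[code[ch] for ch in lab] for lab in labels]
    size = sum(len(w) for w in words)                       # characters stored so far
    while True:
        pairs = Counter()
        for p in parsed:                                    # pairs never cross labels
            pairs.update(zip(p, p[1:]))
        added = False
        for (a, b), cnt in pairs.most_common(k):
            new = words[a] + words[b]
            if cnt < 2 or size + len(new) > D:
                continue
            c, hits, rewritten = len(words), 0, []
            for p in parsed:                                # left-to-right substitution
                out, i = [], 0
                while i < len(p):
                    if i + 1 < len(p) and p[i] == a and p[i + 1] == b:
                        out.append(c)
                        hits += 1
                        i += 2
                    else:
                        out.append(p[i])
                        i += 1
                rewritten.append(out)
            if hits:                                        # the pair may have vanished
                words.append(new)
                size += len(new)
                parsed, added = rewritten, True
        if not added:                                       # budget full or no repeats
            return words, parsed
```

The output provides the dictionary and the parsing at the same time. Each word is stored flat rather than as a pairing rule, which is what allows constant work per decoded character.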

Note that in principle our dictionary representation is less space-efficient than plain Re-Pair, 
where the words are represented recursively as pairing rules. However accessing a single character 
from a recursive rule has a cost dependent on the rule tree height, so it would fail our requirement 
of constant amount of work per decoded character. 

Implementation notes. For the BP vector we use the Range Min tree described in Section 4. Rank is supported using the rank9 structure described in [33], while Select is implemented through a one-level hinted binary search. The search for the branching character is replaced by a linear search, which for the cardinalities considered is actually faster in practice. The dictionary is represented as the concatenation of the words encoded in 16-bit characters to fit the larger alphabet Σ' = [0, 511). The dictionary size bound D is chosen to be 2^16, so that the word endpoints can be encoded in 16-bit pointers. The small size of the dictionary also makes it more likely that it (or at least its most frequently accessed part) is kept in cache.

3 Monotone Minimal Perfect Hash for Strings 

Minimal perfect hash functions map a set of strings S bijectively into [0, |S|). Monotone minimal perfect hash functions [3] (or monotone hashes) also require that the mapping preserves the lexicographic order of the strings (not to be confused with generic order-preserving hashing). We remark that, as for standard minimal perfect hash functions, Lookup can return any number on strings outside of S, hence the data structure does not have to store the string set.

The hollow trie [3] is a particular instance of monotone hash. It consists of a binary trie on S, of which only the trie topology and the skips of the internal nodes are stored, in succinct form. To compute the hash value of a string x, a blind search is performed: the trie is traversed matching only the branching characters (bits, in this case) of x. If x ∈ S, the leaf reached is the correct one, and its unique identifier in [0, |S|) is returned; otherwise, it has the longest prefix match with x, useful in some applications.
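A toy Python version of the blind search illustrates why only skips and topology are needed. Strings are binarized byte by byte (ASCII assumed), with a trailing NUL byte that keeps the set prefix-free and preserves lexicographic order. Internal nodes are `(skip, left, right)` tuples. For readability, a leaf stores its rank, which a real hollow trie would instead derive by counting leaves. All names are ours.

```python
def to_bits(s):
    """ASCII string -> bit string, plus a NUL terminator byte (assumption of this sketch)."""
    return "".join(format(ord(ch), "08b") for ch in s) + "0" * 8


def build_hollow(keys, depth=0, first=0):
    """Hollow trie over sorted, distinct, prefix-free bit strings."""
    if len(keys) == 1:
        return first                   # leaf: its rank (derivable in a real hollow trie)
    a, b = keys[0], keys[-1]
    k = depth
    while a[k] == b[k]:                # longest common prefix of the sorted set
        k += 1
    split = next(i for i, s in enumerate(keys) if s[k] == "1")
    return (k - depth,                 # skip: bits jumped over without being stored
            build_hollow(keys[:split], k + 1, first),
            build_hollow(keys[split:], k + 1, first + split))


def blind_lookup(node, x):
    """Inspect only the branching bits of x; exact for x in S, arbitrary otherwise."""
    pos = 0
    while isinstance(node, tuple):
        skip, left, right = node
        pos += skip                    # the skipped bits of x are never compared
        bit = x[pos] if pos < len(x) else "0"
        node = right if bit == "1" else left
        pos += 1
    return node
```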

The cost of unbalancedness for hollow tries is even larger than that for normal tries: since the strings over Σ have to be converted to a binary alphabet, the height is potentially multiplied by O(log |Σ|) with respect to that of a trie on S. The experiments in [5] show indeed that the data structure is not practical compared to other monotone hashes analyzed in that paper.
Path decomposition with lexicographic order. To tackle their unbalancedness, we apply the centroid path decomposition idea to hollow tries. The construction presented in Section 2 cannot be used directly, because we want to both preserve the lexicographic ordering of the strings and guarantee the logarithmic height. However, both the binary alphabet and the fact that we do not need the Access operation come to our aid. First, inspired again by [18], we arrange the sub-tries in lexicographic order. This means that the sub-tries on the left of the path are arranged top-to-bottom, and precede all those on the right, which are arranged bottom-to-top. In the path decomposition tree we call left children the ones corresponding to sub-tries hanging off the left side of the path, and right children the ones corresponding to those hanging off the right side. Figure 2 shows the new ordering.
Figure 2: Path decomposition of a hollow trie. The δ_i denote the skips.



We now need a small change in the heavy path strategy: instead of breaking ties arbitrarily, 
we choose the left child. We call this strategy left-biased heavy path, which gives the following. 

Observation 3.1 Every node-to-leaf left-biased heavy path in a binary trie ends with a left turn. Hence, every internal node of the resulting path decomposition has at least one right child.

Trie representation. The bitvector BP is defined as in Section 2. The label associated with each node is the sequence of skips interleaved with the directions taken in the centroid path, excluding the leaf skip, as in Figure 2. Two aligned bitvectors L_high and L_low are used to represent the labels using an encoding inspired by γ codes: the skips are incremented by one (to exclude 0 from the domain), and their binary representations (without the leading 1) are interleaved with the path directions and concatenated in L_low. L_high consists of runs of 0s whose lengths correspond to the lengths of the binary representations of the skips, each followed by a 1, so that the endpoints of the (skip, direction) pair encodings in L_low correspond to the 1s in L_high. Thus a Select directory on L_high enables random access to the sequence of (skip, direction) pairs. The labels of the nodes are concatenated in depth-first order: the (s in BP are in one-to-one correspondence with the (skip, direction) pairs.
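The two aligned bitvectors can be sketched as follows. Each pair contributes the binary representation of skip + 1 without its leading 1, followed by the direction bit, to L_low. It contributes an equally long run to L_high, made of 0s ending with a single 1. Placing the direction bit last, and using 0/1 for left/right, are assumptions of this sketch. Decoding the k-th pair only needs the positions of the (k−1)-th and k-th 1 in L_high. Here these are found by a linear scan standing in for the Select directory.

```python
def encode_pairs(pairs):
    """(skip, direction) pairs -> aligned bit lists (high, low), gamma-style."""
    high, low = [], []
    for skip, direction in pairs:
        body = [int(b) for b in bin(skip + 1)[3:]]   # skip+1 in binary, leading 1 dropped
        low += body + [direction]
        high += [0] * len(body) + [1]                # the 1 marks the end of the pair
    return high, low


def decode_pair(high, low, k):
    """k-th pair (0-based), using a linear Select_1 on `high` as a stand-in."""
    ends = [i for i, b in enumerate(high) if b]
    start = ends[k - 1] + 1 if k else 0
    body = "".join(map(str, low[start:ends[k]]))
    return int("1" + body, 2) - 1, low[ends[k]]
```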

Trie operations. As in Section 2, a trie traversal is simulated on the path decomposition tree. In the root node, the sequence of (skip, direction) pairs is scanned (through L_high and L_low): during the scan, the numbers of left and right children passed by are kept; when a mismatch in the string is found, the search proceeds in the corresponding child. Because of the ordering of the children, if the mismatch leads to a left child, the child index is the number of left children seen in the scan, while if it leads to a right child, it is the node degree minus the number of right children seen (because the latter are stored from right to left). The search proceeds recursively until the string's characters are consumed.

When the search ends, the depth-first order of the node found is not yet the number we are looking for: all the ancestors where we turned left come before the found node in depth-first order but after it in lexicographic order. Besides, if the found node is not a leaf, all the strings in the left sub-tries of the corresponding path are lexicographically smaller than the current string. It is easy to fix these issues: during the traversal we can count the number of left turns and subtract it from the final index. To account for the left sub-tries, using Observation 3.1 we can count the number of their leaves by jumping to the first right child with a FindClose: the number of nodes skipped in the jump is equal to the number of leaves in the left sub-tries of the node.
Time complexity. The running time of Lookup can be analyzed with an argument similar to that for the Lookup of Section 2: during the scan there cannot be more skips than the pattern length; besides, there is no binary search. Hence the number of operations is O(min(p, h)), while the number of random memory accesses is bounded by O(h).

Implementation notes. To support Select on L_high we use a variant of the darray [27]: since the 1s in the sequence are at most 64 bits apart, we can bound the size of the blocks so that we do not need the overflow vector (called S_l in [27]).

4 Balanced Parentheses: The Range Min Tree 

In this section we describe the data structure supporting FindClose and FindOpen. As it is not 
restricted to tries, we believe it is of independent interest. 

We begin by discussing the Range Min-Max tree [28], which is a succinct data structure that supports operations on balanced parentheses in O(log n) time. It was shown in [3] that it is very efficient in practice. Specifically, it is a data structure on {−1, 0, +1} sequences that supports the forward search FwdSearch(i, x): given a position i and a target value x, return the first position j > i such that the sum of the values in the sequence between i and j is equal to x. The application to balanced parentheses is straightforward: if the sequence takes value +1 on open parentheses and −1 on closed parentheses, FindClose(i) = FwdSearch(i, 0). In other words, it is the first position after i where the excess, relative to i, is zero, the excess being defined as the difference between the number of open and closed parentheses up to the given position. Backward search is defined similarly for FindOpen.

The data structure is defined as follows: the sequence is divided into blocks of the same size and a tree is formed over the blocks. The leaves store the minimum m_i and the maximum M_i of the sequence's cumulative sum (the excess, for balanced parentheses) for each block i, and the internal nodes store the same values for their sub-trees. The forward search traverses the tree to find the first block j after i where the target value x is between m_j and M_j. Since the sequence takes values in {−1, 0, +1}, block j contains all the intermediate values between m_j and M_j, and so it must contain x. A linear search is then performed within the block (usually through lookup tables).

We tailor the above data structure to support only FindOpen and FindClose, thus reducing the space requirement and improving the running time. We list our two modifications.
Halving the tree space. We discard the maxima and store only the minima, and call the resulting tree the Range Min tree. During the block search, we only check that the target value is greater than or equal to the block minimum. The following lemma guarantees that the forward search is correct. A symmetric argument holds for the backward search.

Lemma 4.1 In a balanced parentheses sequence, the Range Min tree forward search for x = 0 finds the same block as the Range Min-Max tree.

Proof. Since the Min search is a relaxation of the Min-Max search, the block j' returned by the search in the Min tree cannot come after the block j found by the Min-Max search, i.e. j' ≤ j. Suppose by contradiction that j' < j. Then block j' lies between the opening parenthesis and its mate, and since the sequence of parentheses is balanced, all the positions between two mates have excess greater than the excess of the opening parenthesis. Then M_{j'} is greater than the excess of the opening parenthesis, which is the target value, while m_{j'} is not greater than the target by the choice of j'. Hence j' is a valid block for the forward search in the Min-Max tree, but since j' < j we have a contradiction. ■
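The Min-only search can be sketched in Python as follows. For readability, the excess is kept as a plain array, so the structure is not succinct, and the tree is an array-based segment tree. Both in-block searches are linear scans rather than lookup tables or the broadword search described next. Following the convention of the proof, the target is the excess just before the opening parenthesis. The class and method names are ours.

```python
class RangeMinTree:
    """Range Min tree sketch over a BP bit list (1 = '(', 0 = ')').
    excess[k] = #'(' - #')' in bp[0:k]; leaves keep only the minimum excess
    reached inside each block, and no maxima are stored."""

    def __init__(self, bp, block=8):
        self.bp, self.block = bp, block
        self.excess = [0]
        for b in bp:
            self.excess.append(self.excess[-1] + (1 if b else -1))
        nblocks = (len(bp) + block - 1) // block
        self.size = 1
        while self.size < nblocks:
            self.size *= 2
        self.tree = [float("inf")] * (2 * self.size)
        for q in range(nblocks):           # excess values reached after each position of block q
            lo, hi = q * block + 1, min((q + 1) * block, len(bp))
            self.tree[self.size + q] = min(self.excess[lo:hi + 1])
        for v in range(self.size - 1, 0, -1):
            self.tree[v] = min(self.tree[2 * v], self.tree[2 * v + 1])

    def _first_block(self, v, l, r, lo, target):
        """Leftmost block >= lo whose minimum is <= target (the Min-only test)."""
        if r <= lo or self.tree[v] > target:
            return -1
        if r - l == 1:
            return l
        mid = (l + r) // 2
        res = self._first_block(2 * v, l, mid, lo, target)
        return res if res != -1 else self._first_block(2 * v + 1, mid, r, lo, target)

    def _scan(self, start, end, target):
        for k in range(start, end):
            if self.excess[k + 1] == target:
                return k
        return -1

    def find_close(self, i):
        target = self.excess[i]            # the mate is where the excess first returns here
        q = i // self.block
        k = self._scan(i + 1, min((q + 1) * self.block, len(self.bp)), target)
        if k != -1:                        # mate inside the block of i
            return k
        q = self._first_block(1, 0, self.size, q + 1, target)
        if q == -1:
            raise ValueError("unbalanced sequence")
        # by Lemma 4.1 this block contains the mate, so only this block is scanned
        return self._scan(q * self.block, min((q + 1) * self.block, len(self.bp)), target)
```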

Broadword in-block search. The performance of the in-block search is crucial, as it is the inner loop of the search. In practical implementations it is usually performed byte-by-byte with a lookup table that contains the solution for each possible byte and excess. This involves many branches and accesses to a fairly large lookup table for each byte. If we knew which byte contains the closing parenthesis, we could instead use the lookup table on that byte only.
To find that byte we can use the same trick as the Range Min tree: the first byte whose min-excess is smaller than the target excess contains the closing parenthesis. We find it with a hybrid lookup table/broadword approach.

We divide the block into machine words. For each word w we compute the word m_8 where the i-th byte contains the min-excess of the i-th byte in w with inverted sign, so that it is non-negative. This is achieved through a pre-computed lookup table which contains the min-excess for each possible byte. At the same time we compute the byte counts c_8 of w, where the i-th byte contains the number of 1s of the i-th byte of w, using the algorithm described in [22].

Using the equality Excess(i) = 2 · Rank_((i) − i, we can easily compute the excess for each byte of w: if e_w is the excess at the starting position of w, the word e_8 whose i-th byte contains the excess at the starting position of the i-th byte of w can be obtained through the following formula³:

$$e_8 = \Big(e_w + \big((2 \cdot c_8 - \mathtt{0x}\ldots\mathtt{08080808}) \ll 8\big)\Big) \cdot \mathtt{0x}\ldots\mathtt{01010101}$$

Now we have all we need: the closing parenthesis is in the byte where the excess function crosses zero, in other words in the byte whose excess added to the min-excess is smaller than zero. Hence we are looking for the first byte position in which e_8 is smaller than m_8 (recall that the bytes in m_8 are negated). This can be done using the <_8 operation described in [22] to compute a mask z_8 = e_8 <_8 m_8, where the i-th byte is 1 if and only if the i-th byte of e_8 is smaller than the i-th byte of m_8. If z_8 is zero, the word does not contain the closing parenthesis; otherwise, an LSB operation quickly returns the index of the byte containing the solution. The same algorithm can be applied symmetrically for FindOpen.
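The word-level steps can be emulated in Python with integers masked to 64 bits. Parenthesis bits are taken least significant bit first (1 = `(`). In this sketch, e_w is the excess relative to the target at the start of the word, assumed to lie in [0, 128), and the answer is the first position where it drops below zero. The byte-wise comparison uses a standard broadword unsigned ≤ test from which < is derived. The final in-byte step is a short loop where the paper uses a lookup table. All names are ours. Bytes after the first negative excess may be corrupted by the borrow (footnote 3), but the byte before them is already flagged, so the least significant flagged byte is still correct.

```python
MASK = (1 << 64) - 1
ONES = 0x0101010101010101            # 1 in every byte
HIGH = 0x8080808080808080            # high bit of every byte

def _byte_min_excess(v):
    """Negated minimum prefix excess of a byte (empty prefix included): in [0, 8]."""
    run = low = 0
    for j in range(8):
        run += 1 if (v >> j) & 1 else -1
        low = min(low, run)
    return -low

MIN_EXCESS = [_byte_min_excess(v) for v in range(256)]   # the small lookup table

def _leq8(x, y):
    """Byte-wise unsigned x <= y; the result is in the high bit of each byte."""
    return ((((y | HIGH) - (x & ~HIGH & MASK)) | (x ^ y)) ^ (x & ~y)) & HIGH

def _lt8(x, y):
    return _leq8(y, x) ^ HIGH                            # x < y  iff  not (y <= x)

def find_close_in_word(w, e_w):
    """First bit position where the excess, starting from e_w, goes negative; -1 if none."""
    m8 = 0
    for i in range(8):                                   # 8 table lookups
        m8 |= MIN_EXCESS[(w >> (8 * i)) & 0xFF] << (8 * i)
    c8 = w - ((w >> 1) & 0x5555555555555555)             # byte-wise popcounts
    c8 = (c8 & 0x3333333333333333) + ((c8 >> 2) & 0x3333333333333333)
    c8 = (c8 + (c8 >> 4)) & 0x0F0F0F0F0F0F0F0F
    d = (2 * c8 - 0x0808080808080808) & MASK             # per-byte excess deltas
    e8 = ((e_w + ((d << 8) & MASK)) * ONES) & MASK       # excess at the start of each byte
    z8 = _lt8(e8, m8)
    if z8 == 0:                                          # the single branch
        return -1
    byte = ((z8 & -z8).bit_length() - 1) >> 3           # LSB: first flagged byte
    run = (e8 >> (8 * byte)) & 0xFF
    for j in range(8 * byte, 8 * byte + 8):              # a lookup table in practice
        run += 1 if (w >> j) & 1 else -1
        if run < 0:
            return j
    return -1
```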

All in all, we perform 8 lookups in a very small table, a few arithmetic operations and a single branch (to check whether the word contains the solution or not). In our experiments, the approach described above results in about 30% faster operations in tree-traversal benchmarks, with respect to byte-by-byte search. A similar improvement occurs in our trie implementations.

5 Experimental Analysis 

In this section we discuss a series of experiments we performed on both real-world and synthetic 
data. We performed several tests both to collect statistics that show how our path decomposi- 
tions give an algorithmic advantage over standard tries, and to benchmark the implementations 
comparing them with other practical data structures. 

Setting. The experiments were run on a 64-bit 2.53GHz Core i7 processor with 8MB L3 cache 
and 24GB RAM, running Windows Server 2008 R2. All the C++ code was compiled with MSVC 
10, while for Java we used the Sun JVM 6. 
Datasets. The tests were run on the following datasets. 

• enwiki-titles (163MiB, 8.5M strings): All the page titles from English Wikipedia. 

• aol-queries (224MiB, 10. 2M strings): The queries in the AOL 2006 query log [2j. 

• uk-2002 (1.3GiB, 18. 5M strings): The URLs of a 2002 crawl of the . uk domain [9j. 

• webbase-20 01 (6.6GiB, 114.3M strings): The URLs in the Stanford WebBase from [23]. 

• synthetic (1.4GiB, 2.5M strings): The set of strings $a^i c^j b_t \sigma_1 \dots \sigma_k$, where $i$ and $j$ range 
in $[0, 500)$, $t$ ranges in $[0, 10)$, the $\sigma_i$ are all distinct (but equal for each string), and $k = 100$. 
The resulting tries are very unbalanced, while the constant suffix $\sigma_1 \dots \sigma_k$ stresses the 
in-bucket search in bucketed front coding, and its redundancy is exploited neither by front coding 
nor by non-compressed tries. At the same time, the strings are extremely compressible. 
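The following arithmetic checks that the synthetic definition is consistent with the stated size. It assumes one byte per symbol and ignores string separators. Averaging the lengths of $a^i$ and $c^j$ over $[0, 500)$ gives:

$$
\begin{aligned}
|S| &= 500 \cdot 500 \cdot 10 = 2.5 \times 10^{6},\\
\mathbb{E}\left[\,|a^i c^j b_t \sigma_1 \cdots \sigma_k|\,\right] &= 249.5 + 249.5 + 1 + 100 = 600,\\
2.5 \times 10^{6} \cdot 600 &= 1.5 \times 10^{9}\ \text{bytes} \approx 1.40\ \text{GiB}.
\end{aligned}
$$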

3 Some care is needed here for a correct search: the excess can be negative, hence the carry in the subtraction 
corrupts the bytes after the first byte that contains the zero. However, this means that the word does contain the 
solution, and the closing parenthesis is in the byte that precedes the one where the sampled excess goes negative. 

                              enwiki-titles  aol-queries  uk-2002  webbase-2001  synthetic
Compacted trie avg. height              9.8         11.0     16.5          18.1      504.4
Lex. avg. height                        8.7          9.9     14.0          15.2      503.5
Centroid avg. height                    5.2          5.2      5.9           6.2        2.8
Hollow avg. height                     49.7         50.8     55.3          67.3     1005.3
Centroid hollow avg. height             7.9          8.0      8.4           9.2        2.8

Table 1: Average height. For tries the average height of the leaves is considered, while for 
path-decomposed tries all the nodes are considered (see the comments after Observation 2). 

Average height. Table 1 compares the average height of plain tries with that of their path 
decomposition trees. In all the real-world datasets the centroid path decomposition reduces the 
height by a factor of about 2-3 compared to the standard compacted trie. The gap is even more 
dramatic in hollow tries, where the binarization of the strings causes a blow-up in height by a 
factor close to log |Σ|, while the height of the centroid path-decomposed tree is very small, 
actually much smaller than log |S|. It is interesting to note that even though the lexicographic 
path decomposition is unbalanced, it still improves on the trie, due to the higher fan-out of the 
internal nodes. 

The synthetic dataset is a pathological case for tries, but the centroid path-decomposition 
still maintains an extremely low average height. 
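The statistics in Table 1 can be approximated with one recursive pass over the sorted strings. This sketch is hypothetical: the names visit_node, average_heights and synthetic are ours. It fixes counting conventions that may differ from the paper's. A leaf's trie height is its number of edges from the root. Its height in a path-decomposed tree is 1 plus the number of light edges on its root-to-leaf path, because each decomposition node is the root-to-leaf path of exactly one leaf. It also builds a scaled-down analogue of the synthetic dataset with made-up symbols for $b_t$ and $\sigma_q$.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Sums (later averages) over the leaves of the compacted trie of a string set.
struct HeightStats {
    double trie = 0, lex = 0, centroid = 0;
    std::size_t leaves = 0;
};

// s[lo, hi) is sorted, distinct, prefix-free, and shares its first `depth` chars.
static void visit_node(const std::vector<std::string>& s, std::size_t lo,
                       std::size_t hi, std::size_t depth, int edges,
                       int lex_light, int cen_light, HeightStats& st) {
    if (hi - lo == 1) {  // leaf: the remaining suffix labels its incoming edge
        st.trie += edges;
        st.lex += 1 + lex_light;
        st.centroid += 1 + cen_light;
        ++st.leaves;
        return;
    }
    std::size_t d = depth;  // branching position: LCP of first and last string
    while (s[lo][d] == s[hi - 1][d]) ++d;
    assert(d < s[lo].size() && "input must be prefix-free");
    std::vector<std::pair<std::size_t, std::size_t>> kids;  // one range per child
    for (std::size_t i = lo; i < hi;) {
        std::size_t j = i + 1;
        while (j < hi && s[j][d] == s[i][d]) ++j;
        kids.emplace_back(i, j);
        i = j;
    }
    // Lexicographic decomposition: the heavy child is the first (smallest) one.
    // Centroid decomposition: the heavy child has the most leaves (first on ties).
    std::size_t heavy = 0;
    for (std::size_t k = 1; k < kids.size(); ++k)
        if (kids[k].second - kids[k].first > kids[heavy].second - kids[heavy].first)
            heavy = k;
    for (std::size_t k = 0; k < kids.size(); ++k)
        visit_node(s, kids[k].first, kids[k].second, d + 1, edges + 1,
                   lex_light + (k != 0 ? 1 : 0), cen_light + (k != heavy ? 1 : 0), st);
}

HeightStats average_heights(std::vector<std::string> s) {
    std::sort(s.begin(), s.end());
    s.erase(std::unique(s.begin(), s.end()), s.end());
    HeightStats st;
    if (!s.empty()) visit_node(s, 0, s.size(), 0, 0, 0, 0, st);
    if (st.leaves > 0) {  // turn the sums into averages
        st.trie /= st.leaves;
        st.lex /= st.leaves;
        st.centroid /= st.leaves;
    }
    return st;
}

// Scaled-down analogue of the synthetic dataset a^i c^j b_t sigma_1..sigma_k.
// Hypothetical symbols: b_t = '0' + t (nt <= 10), sigma_q = 'A' + q (k <= 26).
std::vector<std::string> synthetic(int n, int nt, int k) {
    std::string suffix;
    for (int q = 0; q < k; ++q) suffix += static_cast<char>('A' + q);
    std::vector<std::string> out;
    for (int i = 0; i < n; ++i)
        for (int j = 0; j < n; ++j)
            for (int t = 0; t < nt; ++t)
                out.push_back(std::string(i, 'a') + std::string(j, 'c') +
                              static_cast<char>('0' + t) + suffix);
    return out;
}
```

For `synthetic(50, 10, 20)`, every key $a^i c^j b_t \dots$ sits at compacted-trie depth $i + j + 1$, so the trie average is exactly 50. The centroid decomposition stays below 3. With digits sorting before letters, the lexicographic decomposition is about as deep as the trie, which mirrors the synthetic column of Table 1.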

String dictionary data structures. We compared the performance of our implementations of 
path-decomposed tries with other data structures. Centroid and Centroid compr. implement the 
centroid path-decomposed trie described in Section 2, in the versions without and with labels 
compression, respectively. Likewise, Lex. and Lex. compr. implement the lexicographic version. 

Re-Pair and HTFC are, respectively, the Re-Pair and Hu-Tucker compressed variants of Front 
Coding from [10]. For HTFC we chose bucket size 8 as the best space/time trade-off. Comparison 
with Front Coding is of particular interest, as it is one of the data structures generally preferred 
by practitioners. 
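For readers less familiar with the Front Coding baseline, here is a hypothetical sketch of plain bucketed front coding, using bucket size 8 by default. It deliberately omits the Hu-Tucker and Re-Pair compression of [10], so it is a simplified stand-in rather than HTFC. Each bucket stores its first string verbatim. Every other string is stored as the length of the prefix it shares with its predecessor plus the remaining suffix. Lookup binary-searches the bucket heads and then decodes the chosen bucket sequentially. The class and method names are ours.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Plain bucketed front coding (no suffix compression). Input: sorted, distinct strings.
class FrontCoded {
 public:
  explicit FrontCoded(const std::vector<std::string>& sorted, std::size_t bucket = 8)
      : b_(bucket), n_(sorted.size()) {
    assert(b_ > 0);
    std::string prev;
    for (std::size_t i = 0; i < n_; ++i) {
      const std::string& s = sorted[i];
      if (i % b_ == 0) {  // bucket head: stored verbatim
        heads_.push_back(s);
        offsets_.push_back(entries_.size());
      } else {  // other strings: (LCP with the previous string, remaining suffix)
        std::size_t l = 0;
        while (l < prev.size() && l < s.size() && prev[l] == s[l]) ++l;
        entries_.push_back({l, s.substr(l)});
      }
      prev = s;
    }
  }

  // Access: decode the bucket sequentially up to position i.
  std::string access(std::size_t i) const {
    std::size_t k = i / b_;
    std::string cur = heads_[k];
    for (std::size_t r = 0; r < i % b_; ++r) apply_entry(entries_[offsets_[k] + r], cur);
    return cur;
  }

  // Lookup: binary search on the bucket heads, then linear in-bucket search.
  long lookup(const std::string& key) const {
    auto it = std::upper_bound(heads_.begin(), heads_.end(), key);
    if (it == heads_.begin()) return -1;
    std::size_t k = static_cast<std::size_t>(it - heads_.begin()) - 1;
    std::size_t first = k * b_, last = std::min(n_, first + b_);
    std::string cur = heads_[k];
    for (std::size_t i = first;;) {
      if (cur == key) return static_cast<long>(i);
      if (++i == last) return -1;
      apply_entry(entries_[offsets_[k] + (i - first - 1)], cur);
    }
  }

 private:
  struct Entry {
    std::size_t lcp;
    std::string suffix;
  };
  static void apply_entry(const Entry& e, std::string& cur) {
    cur.resize(e.lcp);  // keep the prefix shared with the previous string...
    cur += e.suffix;    // ...and append the stored suffix
  }
  std::size_t b_, n_;
  std::vector<std::string> heads_;
  std::vector<std::size_t> offsets_;  // index of each bucket's first entry
  std::vector<Entry> entries_;
};
```

On the synthetic strings, consecutive keys in a bucket share $a^i c^j$ and then differ at $b_t$. As a result, each stored entry repeats the 100-symbol suffix and every decoding step re-appends it. This is the in-bucket cost mentioned in the dataset description.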

TX is a popular open-source, straightforward implementation of a (non-compacted) trie that 
uses LOUDS [21] to represent the tree. The code can be downloaded from [32]. We made some 
slight changes to avoid keeping the whole string set in memory during construction. 

To measure the running times, we chose 1 million random (and randomly shuffled) strings 
from each dataset for Lookup and 1 million random indices for Access. Each test was 
averaged over 10 runs. The construction time was averaged over 3 runs. 
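The timing protocol above could be scripted roughly as follows. This is a hypothetical harness, not the authors' setup. The name mean_query_micros is ours, and sampling the query strings without replacement via `std::sample` is an assumption. It reports the mean time per query in microseconds, averaged over several runs of the same shuffled batch.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <functional>
#include <iterator>
#include <random>
#include <string>
#include <vector>

// Mean time per query (microseconds): random distinct sample, shuffled, timed
// as a batch and averaged over `runs` repetitions.
double mean_query_micros(const std::vector<std::string>& keys,
                         const std::function<long(const std::string&)>& query,
                         std::size_t num_queries, int runs, unsigned seed = 42) {
    assert(!keys.empty() && num_queries > 0 && runs > 0);
    std::mt19937_64 rng(seed);
    std::vector<std::string> queries;
    // std::sample preserves the input order, hence the explicit shuffle.
    std::sample(keys.begin(), keys.end(), std::back_inserter(queries),
                std::min(num_queries, keys.size()), rng);
    std::shuffle(queries.begin(), queries.end(), rng);
    long sink = 0;  // consume the results so the queries are not optimized away
    double total_us = 0;
    for (int r = 0; r < runs; ++r) {
        auto t0 = std::chrono::steady_clock::now();
        for (const std::string& q : queries) sink += query(q);
        auto t1 = std::chrono::steady_clock::now();
        total_us += std::chrono::duration<double, std::micro>(t1 - t0).count();
    }
    volatile long keep = sink;
    (void)keep;
    return total_us / (static_cast<double>(runs) * static_cast<double>(queries.size()));
}
```

The call `mean_query_micros(keys, lookup, 1000000, 10)` would mirror the parameters given in the text. The `std::function` indirection adds a small constant cost per call, which a template parameter would avoid.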

Re-Pair, HTFC and TX do not support files bigger than 2GiB, so we could not run the tests 
on webbase-2001. Moreover, Re-Pair did not complete the construction on synthetic within 6 hours, 
so we had to kill the process. 

String dictionaries results. The results of the tests can be seen in Table 2. On all datasets 
our compressed tries obtain the smallest space, except on uk-2002, where they come a close 
second. The centroid versions also have the fastest Lookup times, while the Access time is 
better for Re-Pair and occasionally HTFC, although their times are within 20% of that of the 
centroid trie. TX is consistently the largest and slowest on all the datasets. 

Perhaps surprisingly, the lexicographic trie is not much slower than the centroid trie for both 
Lookup and Access. However, on the synthetic dataset the unbalanced tries are more than 20 
times slower than the balanced ones. HTFC exhibits a less dramatic slowdown, but it is still on the 
order of 5x on Lookup compared to the centroid trie. Although this behavior does not occur on 
our real-world datasets, it shows that no performance assumptions can be made for unbalanced 
tries. For example, in an adversarial environment an attacker could exploit this weakness to 
mount a denial-of-service attack. 

We remark that labels compression adds an almost negligible overhead to both Lookup and 
Access, due to the extremely simple dictionary scheme, while achieving very good compression. 
Hence, unless the construction time is a concern (in which case other dictionary selection 
strategies can also be explored), it is always convenient to compress the labels. 

Monotone hashes (each cell: ctps / bps / lkp)
                   enwiki-titles       aol-queries         uk-2002             webbase-2001        synthetic
Centroid hollow    1.1 / 8.40 / 2.7    1.2 / 8.73 / 2.8    1.5 / 8.17 / 3.3    1.5 / 8.02 / 4.4    8.6 / 9.96 / 11.1
Hollow             1.3 / 7.72 / 6.8    1.3 / 8.05 / 7.2    1.7 / 7.48 / 9.3    1.7 / 7.33 / 13.9   9.5 / 9.02 / 137.1
Hollow (Sux [30])  0.9 / 7.66 / 14.6   1.0 / 7.99 / 16.6   1.1 / 7.42 / 18.5   0.9 / 7.27 / 22.4   4.3 / 6.77 / 462.7
PaCo (Sux [30])    2.6 / 8.85 / 2.4    2.9 / 8.91 / 3.1    4.7 / 10.65 / 4.3   18.4 / 9.99 / 4.9   21.3 / 13.37 / 51.1

Table 2: Experimental results. bps is bits per string, ctps is the average construction time per string, 
c. ratio is the compression ratio between the data structure and the original file sizes, lkp is the average 
Lookup time and acs the average Access time. All times are expressed in microseconds. 

Monotone hash data structures. For monotone hashes, we compared our data structures 
with the implementations in [5]. Centroid hollow implements the centroid path-decomposed 
hollow trie described in Section 3. Hollow is a reimplementation of the hollow trie of [5], but 
using a Range Min tree in place of a pioneer-based representation. Hollow (Sux) and PaCo (Sux) 
are two implementations from [5]: the first is the hollow trie, the second a hybrid scheme in which 
a Partially Compacted trie is used to partition the keys into buckets, and each bucket is then hashed 
with an MWHC function. Among the structures in [5], PaCo gives the best trade-off between 
space and lookup time. The implementations are freely available as part of the Sux project [30]. 
To measure construction time and lookup time we adopted the same strategy as for string 
dictionaries. For Sux, as suggested in [5], we performed 3 runs of lookups before measuring the 
lookup time, to let the JIT warm up and optimize the generated code. 
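The partition-then-hash idea behind PaCo can be illustrated with the following deliberately naive sketch. It is neither PaCo nor succinct. The partially compacted trie is replaced by a binary search over the last key of each bucket. The MWHC function is replaced by a `std::unordered_map` that stores the keys. Only the rank arithmetic carries over: bucket index times bucket size, plus the rank inside the bucket. The class name is hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Naive stand-ins: binary search over bucket delimiters (for the trie) and a
// per-bucket hash map (for MWHC). Not space-efficient; illustration only.
class BucketedMonotoneHash {
 public:
  BucketedMonotoneHash(const std::vector<std::string>& sorted, std::size_t bucket)
      : b_(bucket) {
    assert(b_ > 0 && !sorted.empty());
    for (std::size_t i = 0; i < sorted.size(); i += b_) {
      std::size_t end = std::min(sorted.size(), i + b_);
      delims_.push_back(sorted[end - 1]);  // last key of the bucket
      std::unordered_map<std::string, std::size_t> local;
      for (std::size_t j = i; j < end; ++j) local.emplace(sorted[j], j - i);
      local_.push_back(std::move(local));
    }
  }

  // Lexicographic rank of a key of the set; arbitrary for keys outside the set.
  std::size_t operator()(const std::string& key) const {
    std::size_t k = static_cast<std::size_t>(
        std::lower_bound(delims_.begin(), delims_.end(), key) - delims_.begin());
    if (k == delims_.size()) k = delims_.size() - 1;  // out-of-set key
    auto it = local_[k].find(key);
    return k * b_ + (it == local_[k].end() ? 0 : it->second);
  }

 private:
  std::size_t b_;
  std::vector<std::string> delims_;
  std::vector<std::unordered_map<std::string, std::size_t>> local_;
};
```

A real monotone minimal perfect hash never stores the keys. Only the structure of the query (bucket first, then the local rank) is meant to match the description above.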

Monotone hash results. Table 2 shows the results for monotone hashes. On all real-world 
datasets the centroid hollow trie is about 2-3 times faster than our implementation of the hollow 
trie and about 5 times faster than the Sux implementation. The centroid hollow trie is competitive 
with PaCo on all datasets, while taking less space and having a substantially simpler construction. 
The synthetic dataset in particular triggers the pathological behavior of all the unbalanced 
structures, with Hollow, Hollow (Sux) and PaCo being respectively 13, 41 and 5 times slower 
than Centroid hollow. Such a large performance gap suggests the same conclusion reached 
for string dictionaries: if predictable performance is needed, unbalanced structures should be 
avoided. 

6 Conclusion and Future Work 

We have presented new succinct representations for tries that guarantee low average height and 
enable the compression of the labels. Our experimental analysis has shown that they obtain 
the best space occupancy when compared to the state of the art, while maintaining competitive 
access times. Moreover, they give the most consistent behavior among different (possibly 
synthetic) datasets. We have not considered alternatives for the dictionary selection algorithm 
in labels compression; any improvement in that direction would benefit the space occupancy 
of our tries. We also plan to consider other kinds of path decompositions (longest path, 
ladder, etc.), which could enable other time/space/functionality trade-offs. 

4 To be fair, we should point out that Sux is implemented in Java, while our structures are implemented in C++. 
However, recent developments of the Java Virtual Machine have made the abstraction penalty gap smaller 
and smaller. Low-level optimized Java (such as the code in Sux) can be on par with C++ for some tasks, and at most 
50% slower than C++ for most other tasks [31]. We remark that the hollow trie construction is actually 
faster in the Sux version than in ours, although the algorithm is very similar. 

Acknowledgments 

We would like to thank the authors of [10] for kindly providing the source code for their algo- 
rithms. 

References 

[1] A. Acharya, H. Zhu, and K. Shen. Adaptive algorithms for cache-efficient trie search. In 
M. T. Goodrich and C. C. McGeoch, editors, Algorithm Engineering and Experimentation, 
International Workshop ALENEX '99, Baltimore, MD, USA, January 15-16, 1999, Selected 
Papers, volume 1619 of Lecture Notes in Computer Science, pages 296-311. Springer, 1999. 

[2] AOL search data. http://www.gregsadetsky.com/aol-data/

[3] D. Arroyuelo, R. Canovas, G. Navarro, and K. Sadakane. Succinct trees in practice. In 
ALENEX, pages 84-97, 2010. 

[4] D. Belazzougui, P. Boldi, R. Pagh, and S. Vigna. Monotone minimal perfect hashing: 
searching a sorted table with O(1) accesses. In SODA, pages 785-794, 2009. 

[5] D. Belazzougui, P. Boldi, R. Pagh, and S. Vigna. Theory and practise of monotone minimal 
perfect hashing. In ALENEX, pages 132-144, 2009. 

[6] M. A. Bender, M. Farach-Colton, and B. C. Kuszmaul. Cache-oblivious string B-trees. 
In S. Vansummeren, editor, Proceedings of the Twenty-Fifth ACM SIGACT-SIGMOD- 
SIGART Symposium on Principles of Database Systems, June 26-28, 2006, Chicago, Illinois, 
USA, pages 233-242. ACM, 2006. 

[7] D. Benoit, E. D. Demaine, J. I. Munro, R. Raman, V. Raman, and S. S. Rao. Representing 
trees of higher degree. Algorithmica, 43(4):275-292, 2005. 

[8] D. K. Blandford and G. E. Blelloch. Compact dictionaries for variable-length keys and data 
with applications. ACM Transactions on Algorithms, 4(2), 2008. 

[9] P. Boldi, B. Codenotti, M. Santini, and S. Vigna. Ubicrawler: A scalable fully distributed 
web crawler. Software: Practice & Experience, 34(8):711-726, 2004. 

[10] N. R. Brisaboa, R. Canovas, F. Claude, M. A. Martinez-Prieto, and G. Navarro. Compressed 
string dictionaries. In SEA, pages 136-147, 2011. 

[11] G. S. Brodal and R. Fagerberg. Cache-oblivious string dictionaries. In SODA, pages 581-590. 
ACM Press, 2006. 

[12] S.-Y. Chiu, W.-K. Hon, R. Shah, and J. S. Vitter. I/O-efficient compressed text indexes: 
From theory to practice. In J. A. Storer and M. W. Marcellin, editors, 2010 Data Com- 
pression Conference (DCC 2010), 24-26 March 2010, Snowbird, UT, USA, pages 426-434. 
IEEE Computer Society, 2010. 

[13] D. R. Clark. Compact pat trees. PhD thesis, University of Waterloo, Waterloo, Ont., 
Canada, 1998. UMI Order No. GAXNQ-21335. 



[14] F. Claude and G. Navarro. Fast and compact web graph representations. TWEB, 4(4), 
2010. 

[15] P. Elias. Efficient storage and retrieval by content and address of static files. Journal of the 
ACM (JACM), 21(2):246-260, 1974. 

[16] R. Fano. On the number of bits required to implement an associative memory. Memorandum 
61. Computer Structures Group, Project MAC, MIT, Cambridge, Mass., nd, 1971. 

[17] P. Ferragina and R. Grossi. The string B-tree: A new data structure for string search in 
external memory and its applications. J. ACM, 46(2):236-280, 1999. 

[18] P. Ferragina, R. Grossi, A. Gupta, R. Shah, and J. S. Vitter. On searching compressed 
string collections cache-obliviously. In PODS, pages 181-190, 2008. 

[19] P. Ferragina, F. Luccio, G. Manzini, and S. Muthukrishnan. Compressing and indexing 
labeled trees, with applications. J. ACM, 57(1), 2009. 

[20] R. Grossi and J. S. Vitter. Compressed suffix arrays and suffix trees with applications to 
text indexing and string matching. SIAM J. Comput., 35(2):378-407, 2005. 

[21] G. Jacobson. Space-efficient static trees and graphs. In FOCS, pages 549-554, 1989. 

[22] D. E. Knuth. The Art of Computer Programming, Volume 4, Fascicle 1: Bitwise Tricks & 
Techniques; Binary Decision Diagrams. Addison-Wesley Professional, 12th edition, 2009. 

[23] Laboratory for Web Algorithmics - Datasets. http://law.dsi.unimi.it/datasets.php

[24] N. J. Larsson and A. Moffat. Offline dictionary-based compression. In Data Compression 
Conference, pages 296-305, 1999. 

[25] J. I. Munro and V. Raman. Succinct representation of balanced parentheses, static trees 
and planar graphs. In FOCS, pages 118-126, 1997. 

[26] J. I. Munro and V. Raman. Succinct representation of balanced parentheses and static 
trees. SIAM Journal on Computing, 31(3):762-776, June 2001. 

[27] D. Okanohara and K. Sadakane. Practical entropy-compressed rank/select dictionary. In 
ALENEX, 2007. 

[28] K. Sadakane and G. Navarro. Fully-functional succinct trees. In SODA, pages 134-149, 
2010. 

[29] D. D. Sleator and R. E. Tarjan. A data structure for dynamic trees. In STOC, pages 
114-122, 1981. 

[30] Sux: Implementing Succinct Data Structures. http://sux.dsi.unimi.it/

[31] The Computer Language Benchmarks Game. http://shootout.alioth.debian.org/

[32] Tx: Succinct Trie Data structure. http://code.google.com/p/tx-trie/

[33] S. Vigna. Broadword implementation of rank/select queries. In WEA, pages 154-168, 2008. 

[34] H. E. Williams and J. Zobel. Compressing integers for fast file access. Comput. J., 42(3):193- 
201, 1999. 


